Top-k Item Identification on Dynamic and Distributed Datasets

Authors

  • Alessio Guerrieri
  • Alberto Montresor
  • Yannis Velegrakis
Abstract

The problem of identifying the most frequent items across multiple datasets has received considerable attention over the last few years. When storage is a scarce resource, the topic is already a challenge; yet, its complexity may be further exacerbated not only by the many independent data sources, but also by the dynamism of the data, i.e., the fact that new items may appear and old ones disappear at any time. In this work, we provide a novel approach to the problem by using an existing gossip-based algorithm for identifying the k most frequent items over a distributed collection of datasets, in ways that deal with the dynamic nature of the data. The algorithm has been thoroughly analyzed through trace-based simulations and compared to state-of-the-art decentralized solutions, showing better precision at reduced communication overhead.

Similar resources

Ensemble-based Top-k Recommender System Considering Incomplete Data

Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (prediction problem) or to identify a set of k items that will interest the user (Top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...

Extracting Support Based k most Strongly Correlated Item Pairs in Large Transaction Databases

The support-confidence framework is misleading in finding statistically meaningful relationships in market basket data. The alternative is to find strongly correlated item pairs from the basket data. However, the strongly correlated pairs query suffers from the problem of setting a suitable threshold. To overcome that, the top-k pairs finding problem has been introduced. Most of the existing techniques are multi-p...

Retrieval of the most relevant facts from data streams joined with slowly evolving dataset published on the Web of Data

Finding the most relevant facts among dynamic and heterogeneous data published on the Web of Data has received growing attention in recent years. RDF Stream Processing (RSP) engines offer a baseline solution to integrate and process streaming data with data distributed on the Web. Unfortunately, the time to access and fetch the distributed data can be so high as to put the RSP engine at risk of lo...

A Boosting Algorithm for Item Recommendation with Implicit Feedback

Many recommendation tasks are formulated as top-N item recommendation problems based on users’ implicit feedback instead of explicit feedback. Here explicit feedback refers to users’ ratings to items while implicit feedback is derived from users’ interactions with items, e.g., number of times a user plays a song. In this paper, we propose a boosting algorithm named AdaBPR (Adaptive Boosting Per...

Entropy-based Scheduling Policy for Cross Aggregate Ranking Workloads

Many data exploration applications require the ability to identify the top-k results according to a scoring function. We study a class of top-k ranking problems where top-k candidates in a dataset are scored with the assistance of another set. We call this class of workloads cross aggregate ranking. Example computation problems include evaluating the Hausdorff distance between two datasets, fin...

Publication date: 2014